Today we will…
Artwork by Allison Horst
Which data follows a tidy data format?
.csv : “Comma-separated”
Name, Age
Bob, 49
Joe, 40
.xls, .xlsx: Microsoft Excel Spreadsheet
- Common approach: save as .csv
- Nicer approach: readxl package
.txt: Plain text
- Could be just text
- Could be comma-separated data
- Could be tab-separated, bar-separated, etc.
- Need to let R know what to look for
The tidyverse has some cleaned-up versions in the readr and readxl packages:
read_csv() works like read.csv(), but it returns a tibble and reports the column types it guessed
read_tsv() is for tab-separated data
read_table() is for data arranged in columns separated by white space
read_delim() is for special “delimiters” separating data
read_excel() is specifically for dealing with Excel files
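As a minimal sketch of these readers, the block below parses the small Name/Age example as comma-, tab-, and bar-separated text. To keep it self-contained, the data is passed as literal strings wrapped in I() (this assumes readr 2.0 or later) instead of file paths, and the Excel example uses a sample workbook that ships with readxl. In practice you would pass the path to your own file.

```r
library(readr)
library(readxl)

# Comma-separated: I() marks the string as literal data, not a file path
people_csv <- read_csv(I("Name,Age\nBob,49\nJoe,40"), show_col_types = FALSE)

# Tab-separated
people_tsv <- read_tsv(I("Name\tAge\nBob\t49\nJoe\t40"), show_col_types = FALSE)

# Bar-separated: tell R which delimiter to look for
people_bar <- read_delim(
  I("Name|Age\nBob|49\nJoe|40"),
  delim = "|",
  show_col_types = FALSE
)

# Excel: read the first sheet of an example workbook bundled with readxl
excel_path <- readxl_example("datasets.xlsx")
excel_data <- read_excel(excel_path, sheet = 1)
```

All three text readers should produce the same two-row tibble; only the delimiter differs.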
Think of a data visualization or graph as a mapping
It’s not just a neat party trick!
Note
“[The grammar] makes it easier for you to iteratively update a plot, changing a single feature at a time. The grammar is also useful because it suggests the high-level aspects of a plot that can be changed, giving you a framework to think about graphics, and hopefully shortening the distance from mind to paper. It also encourages the use of graphics customised to a particular problem, rather than relying on generic named graphics.”
GoG components, as specified in R’s ggplot2
- data
- aes: aesthetic mappings (position, length, color, symbol, …)
- geom: geometric element (point, line, bar, …)
- stat: statistical variable transformation (identity, count, linear model, quantile, …)
- scale: scale transformation (log scale, color mapping, axes tick breaks, …)
- coord: Cartesian, polar, map projection, …
- facet: divide into subplots / small multiples using a categorical variable
Of course, we can also control axes, legends, titles … (guides)
In ggplot2, we map variables from the data set to aesthetics on the chart
Not an exhaustive list – see ggplot2 cheat sheet
Global Aesthetics
Local Aesthetics
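The difference between the two can be sketched with the mpg data set that ships with ggplot2 (the choice of data set and variables here is an assumption for illustration). Aesthetics set in ggplot() are global and inherited by every geom; aesthetics set inside a geom are local to that layer.

```r
library(ggplot2)

# Global aesthetics: color is mapped in ggplot(), so both layers use it
p_global <- ggplot(data = mpg,
                   mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Local aesthetics: color is mapped only inside geom_point()
p_local <- ggplot(data = mpg,
                  mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = drv)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```

With the global mapping, geom_smooth() fits one line per drive train; with the local mapping, the points are still colored but a single line is fit to all of the data.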
In ggplot2, we use a geom function to represent data points, and use the geom’s aesthetic properties to represent variables.
Not an exhaustive list – see ggplot2 cheat sheet
one variable
- geom_density()
- geom_dotplot()
- geom_histogram()
- geom_boxplot()
two variables
- geom_point()
- geom_line()
- geom_density_2d()
three variables
- geom_contour()
- geom_raster()
Once our data is formatted and we know what type of variables we are working with, we can select the correct geom for our visualization.
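A brief sketch of matching the geom to the number of variables, again assuming the mpg data set from ggplot2 (the variables hwy and displ are chosen only for illustration):

```r
library(ggplot2)

# One numeric variable: a histogram of highway mileage
p_hist <- ggplot(data = mpg, mapping = aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# One numeric variable: the same data summarized as a boxplot
p_box <- ggplot(data = mpg, mapping = aes(x = hwy)) +
  geom_boxplot()

# Two numeric variables: a scatterplot of engine size vs. highway mileage
p_point <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()
```

Only the geom (and the aesthetics it needs) changes between plots; the data and the overall structure of the code stay the same.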
A stat builds a new variable to plot (e.g., count and proportion)
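For example, geom_bar() uses a count stat by default, so the bar heights are computed rather than read from the data. The sketch below, which assumes the mpg data set and ggplot2 3.3.0 or later for after_stat(), plots the computed count and then the computed proportion.

```r
library(ggplot2)

# Default stat for geom_bar(): count the rows in each class
p_count <- ggplot(data = mpg, mapping = aes(x = class)) +
  geom_bar()

# Plot the computed proportion instead; group = 1 makes the
# proportions relative to the whole data set, not to each bar
p_prop <- ggplot(data = mpg,
                 mapping = aes(x = class, y = after_stat(prop), group = 1)) +
  geom_bar()
```

Neither count nor prop is a column in mpg; both are new variables built by the stat.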
A way to extract subsets of data and place them side-by-side in graphics
Note
sometimes called small multiples
- facet_grid(. ~ b): facet into columns based on b
- facet_grid(a ~ .): facet into rows based on a
- facet_grid(a ~ b): facet into both rows and columns
- facet_wrap( ~ fl): wrap facets into a rectangular layout
You can set scales to let axis limits vary across facets:
- facet_grid(y ~ x, scales = "free"): x and y axis limits adjust to individual facets
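The faceting options can be sketched on one base plot. This assumes the mpg data set, with drv and cyl standing in for the generic a and b used in the facet formulas.

```r
library(ggplot2)

base_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Columns: one panel per drive train, side by side
p_cols <- base_plot + facet_grid(. ~ drv)

# Rows: one panel per drive train, stacked vertically
p_rows <- base_plot + facet_grid(drv ~ .)

# Rows and columns, with axis limits free to vary across facets
p_grid <- base_plot + facet_grid(drv ~ cyl, scales = "free")

# Wrap the panels for each fuel type into a rectangular layout
p_wrap <- base_plot + facet_wrap(~ fl)
```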
You can also set a labeller to adjust facet labels:
- facet_grid(. ~ fl, labeller = label_both)
- facet_grid(. ~ fl, labeller = label_bquote(alpha ^ .(x)))
- facet_grid(. ~ fl, labeller = label_parsed)
Position adjustments determine how to arrange geoms that would otherwise occupy the same space
- position = 'dodge': Arrange elements side by side
- position = 'fill': Stack elements on top of one another, normalize height
- position = 'stack': Stack elements on top of one another
- position = 'jitter': Add random noise to X & Y position of each element to avoid overplotting (see geom_jitter())
Clearer labels with labs()
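A minimal sketch of labs(), assuming the mpg data set; the label text is made up for illustration. The bars use position = "dodge", one of the position adjustments listed above, so the same plot also shows how a position adjustment is set.

```r
library(ggplot2)

p_labeled <- ggplot(data = mpg, mapping = aes(x = class, fill = drv)) +
  # Place the bars for each drive train side by side instead of stacking
  geom_bar(position = "dodge") +
  # Replace the default variable names with readable labels
  labs(
    x = "Vehicle class",
    y = "Number of cars",
    fill = "Drive train",
    title = "Car counts by class and drive train"
  )
```

Swapping "dodge" for "stack" or "fill" changes only how the bars are arranged; the labels stay the same.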
Tip
Notice how there is a lot of nesting that happens within ggplot2 code (e.g., parentheses within parentheses). It is good practice to put each geom and aesthetic on a new line. This makes code easier to read!
The general guideline is that each line of your code should not be over 80 characters long.
Tip
I encourage you to use your neighbors for support!
Note
I have office hours TODAY, Tuesday (1/17) from 2:40pm - 3pm in 25-103
Today we will…
Lab 2: Exploring Rodents with ggplot2
Challenge 2: Spicing things up with ggplot2
Read Chapter 3: Data Cleaning and Manipulation